[core] Implement RecurseCountsThreadLocal
to be used in gCoreMutex
#19799
base: master
Test Results 16 files 16 suites 2d 13h 47m 55s ⏱️ For more details on these failures, see this check. Results for commit d7797bc. ♻️ This comment has been updated with latest results. |
c28e866 to bc59f1a
bc59f1a to 871b9f9
I'm confused, doesn't RecurseCountsThreadLocal still depend on tbb::enumerable_thread_specific in this case? |
`RecurseCountsThreadLocal` assumes that only two instances are ever created per process. Its implementation can therefore be optimized to the point that it *should* be as fast as or faster than `RecurseCountsTBBUnique`, which removes the Core dependence on TBB without compromising performance.
As described in root-project#19798, we have an old build option that was supposedly removed, but it is still considered in the implementation of `ROOT::gCoreMutex`. It would be good to know what happens when testing ROOT with it (no differences are expected from the performance optimization with TBB, and actually it was always enabled when building with `builtin_tbb=ON` as well, as we do for macOS and Windows).
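To make the "only two instances" assumption concrete, here is a minimal sketch of how a fixed instance count lets each instance use a plain `thread_local` counter instead of a per-object lookup. It is not the code in this PR: `RecurseCountsSketch`, its `Slot` parameter and `fgReaders` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>

// Hypothetical sketch, not the ROOT implementation: each instance is tied to
// a compile-time slot, so its per-thread state can live in a thread_local
// static data member rather than in a container keyed by thread.
template <int Slot>
struct RecurseCountsSketch {
   static_assert(Slot == 0 || Slot == 1, "sketch assumes at most two instances");

   // One counter per (thread, slot) pair; constant-initialized to zero.
   static thread_local std::size_t fgReaders;

   std::size_t IncrementReadCount() { return ++fgReaders; }
   std::size_t DecrementReadCount() { return --fgReaders; }
   std::size_t GetLocalReadersCount() const { return fgReaders; }
};

template <int Slot>
thread_local std::size_t RecurseCountsSketch<Slot>::fgReaders = 0;
```

Two objects with different slots, for example `RecurseCountsSketch<0>` and `RecurseCountsSketch<1>`, keep independent counts on the same thread, and every access compiles down to a direct thread-local load or store.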
871b9f9 to d7797bc
Oh, pardon I forgot to remove the |
So there may be a longstanding issue here, related to bd1894b and discussed in #6919 (comment). Given that this is similar to the old implementation in root/core/thread/inc/ROOT/TReentrantRWLock.hxx (lines 27 to 86 in 5028941)
Oh! I see. Interesting |
I wonder how TBB avoids this problem that Philippe described... |
The difference between
The real challenge with this area is that seeing it work in most (all?) cases does not tell us that we don't have the problem, as I was unable at the time to reproduce the issue deterministically (see the log of bd1894b). At the very least, one should revert bd1894b (i.e. use the old RecurseCounts) to check whether we see the problem. If we do see it, then we have some degree of confidence that the alternative code not failing means it might not have the same problem. If we do not see it, then we cannot tell :(
In tlslock.tar.gz I got to the spirit of what I remember
on macOS and Linux. The process is actually stuck in the ROOT lock. Updating the example by commenting out
thread 2 is waiting for the lock:
while the other thread is waiting on the system lock
It is not clear whether the example is too convoluted.
This second example also shows the essence of the problem without ROOT. Side note: I am guessing the example can be simplified further (by removing one of the layers of libraries).
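For readers without the attached archives, the hang described above (one thread stuck on the ROOT lock while the other waits on a system lock) has the shape of a lock-order inversion. The sketch below is not the content of tlslock.tar.gz. It assumes that shape and uses two ordinary mutexes as stand-ins, with timed acquisition so that it reports the inversion instead of hanging. All names are hypothetical.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>

struct InversionResult {
   bool fAppHolderGotSystem; // thread holding the app lock acquired the system lock
   bool fSystemHolderGotApp; // thread holding the system lock acquired the app lock
};

// Each thread takes one lock, waits until the other thread holds its own lock,
// then tries the remaining lock with a timeout. With blocking lock() calls
// instead of try_lock_for(), both threads would wait forever.
InversionResult DemonstrateInversion()
{
   std::timed_mutex appLock;    // stand-in for an application-level lock
   std::timed_mutex systemLock; // stand-in for a lock owned by the runtime
   std::atomic<int> holding{0};
   std::atomic<int> done{0};
   InversionResult result{true, true};

   auto worker = [&](std::timed_mutex &first, std::timed_mutex &second, bool &got) {
      std::lock_guard<std::timed_mutex> guard(first);
      ++holding;
      while (holding.load() < 2)
         std::this_thread::yield(); // both threads now hold their first lock
      got = second.try_lock_for(std::chrono::milliseconds(50));
      if (got)
         second.unlock();
      ++done;
      while (done.load() < 2)
         std::this_thread::yield(); // keep holding until both attempts finished
   };

   std::thread a(worker, std::ref(appLock), std::ref(systemLock),
                 std::ref(result.fAppHolderGotSystem));
   std::thread b(worker, std::ref(systemLock), std::ref(appLock),
                 std::ref(result.fSystemHolderGotApp));
   a.join();
   b.join();
   return result;
}
```

Both flags come back `false`: neither thread can obtain its second lock while the other still holds it. In the real case one of the two locks would be taken implicitly, which is part of why the problem is hard to reproduce on demand.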
One thing I'm not sure I understood here: would it be sufficient to ensure that the thread_local variables are initialized during the loading of libThread (and/or in the subsequent creation of new threads), as opposed to potentially later, at the first call to GetLocal()?
it can not be done, can it? The official |
Yes, but in this case it could/should be initialized for all existing threads at library load time and for any additional threads immediately when the thread is created. (As opposed to when execution reaches some particular local scope, I think as in https://en.cppreference.com/w/cpp/language/storage_duration.html#Static_block_variables ) |
I can see how that would work for a static variable, but I am confused about how it could work for a thread_local, since there is an unbounded (indeterminate) number of upcoming threads.
Right, for threads created after the library is loaded I would assume there are some cases where the thread_local variables would be initialized immediately upon the creation of the thread. (but not necessarily for the thread_local block variable case I linked above) |
Hmm ... intriguing ... We might indeed be able to use a `thread_local` declared in the global scope rather than inside a function in order to avoid the issue.
Yes, or possibly as a static thread_local class data member |
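The three placements mentioned in this exchange can be compared in a small sketch. It is an illustration only, with hypothetical names (`LazyCounts`, `BlockScopeCounts`, `gReaders`, `CountsHolder`), and it assumes the counter itself can be a plain integer with a constant initializer. A namespace-scope `thread_local` that needs dynamic initialization may still be initialized lazily by the implementation, so the constant initializer matters here. Whether either non-block form avoids the lock issue discussed above is not something this sketch establishes.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

std::atomic<int> gConstructed{0}; // counts dynamic initializations

struct LazyCounts {
   std::size_t fReaders;
   LazyCounts() : fReaders(0) { ++gConstructed; } // non-constexpr: dynamic init
};

// Block scope: constructed the first time each thread passes the declaration.
LazyCounts &BlockScopeCounts()
{
   thread_local LazyCounts counts;
   return counts;
}

// Namespace scope with a constant initializer: no constructor runs on first use.
thread_local std::size_t gReaders = 0;

// Static thread_local class data member, also constant-initialized.
struct CountsHolder {
   static thread_local std::size_t fgReaders;
};
thread_local std::size_t CountsHolder::fgReaders = 0;

// Starts a thread that touches the counters and returns how many dynamic
// initializations that thread triggered.
int ConstructionsCausedByNewThread(bool touchBlockScope)
{
   const int before = gConstructed.load();
   std::thread t([touchBlockScope] {
      ++gReaders;
      ++CountsHolder::fgReaders;
      if (touchBlockScope)
         ++BlockScopeCounts().fReaders;
   });
   t.join();
   return gConstructed.load() - before;
}
```

Only the block-scope variable runs initialization code at first use in the new thread. The two constant-initialized variables are simply read and written, and the calling thread's copies are left untouched.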